Editor's note: Morris Wilburn is a marketing research consultant at Advanced Customer Analytics, Lawrenceville, Ga.
Most studies using maximum difference scaling (max-diff) contain too many questions for any one respondent to answer. A common workaround is to randomly draw a subset of questions for each respondent. At first glance this seems reasonable; in many other areas of research we select or assign things randomly to avoid bias. But it is not appropriate here. It yields interviews in which, at the respondent level, some attributes are seen much more often than others and some are not seen at all. This occurs even when the attributes are perfectly balanced at the total sample level, and it negatively affects the outcome of the analysis.
Consider the following. Exhibit 1 shows a typical design for a max-diff study. In classical experimental design terminology, it is an incomplete block design. Each row is a question. The columns contain codes representing the attributes; there are 21 attributes, coded “1” through “21.” Each question contains five attributes (alternatives).
Incomplete block designs are constructed to be as balanced as possible. By definition, a design is perfectly balanced if it has two characteristics:
- every attribute appears an equal number of times across questions; and
- every attribute appears with each other attribute an equal number of times across questions.
Exhibit 1 – Typical Max-Diff Design
Question | Alternative 1 | Alternative 2 | Alternative 3 | Alternative 4 | Alternative 5 |
1 | 7 | 9 | 10 | 13 | 21 |
2 | 2 | 6 | 9 | 12 | 14 |
3 | 1 | 4 | 6 | 8 | 13 |
4 | 1 | 7 | 9 | 15 | 18 |
5 | 10 | 12 | 13 | 16 | 18 |
6 | 6 | 11 | 12 | 17 | 21 |
7 | 3 | 10 | 14 | 17 | 20 |
8 | 1 | 8 | 10 | 12 | 19 |
9 | 1 | 9 | 11 | 12 | 21 |
10 | 2 | 4 | 5 | 11 | 20 |
11 | 3 | 4 | 9 | 13 | 17 |
12 | 3 | 6 | 9 | 19 | 20 |
13 | 4 | 5 | 14 | 18 | 21 |
14 | 8 | 11 | 13 | 14 | 16 |
15 | 2 | 5 | 8 | 12 | 17 |
16 | 2 | 6 | 7 | 13 | 14 |
17 | 1 | 3 | 5 | 11 | 14 |
18 | 3 | 6 | 8 | 17 | 18 |
19 | 1 | 5 | 7 | 17 | 21 |
20 | 1 | 4 | 6 | 10 | 16 |
21 | 1 | 2 | 15 | 16 | 20 |
22 | 2 | 4 | 9 | 16 | 19 |
23 | 3 | 7 | 12 | 16 | 20 |
24 | 6 | 7 | 11 | 18 | 20 |
25 | 4 | 7 | 12 | 14 | 15 |
26 | 2 | 7 | 8 | 10 | 14 |
27 | 5 | 6 | 15 | 16 | 21 |
28 | 11 | 13 | 15 | 17 | 19 |
29 | 3 | 4 | 12 | 15 | 18 |
30 | 7 | 11 | 16 | 17 | 19 |
31 | 5 | 9 | 10 | 11 | 18 |
32 | 1 | 3 | 5 | 7 | 13 |
33 | 8 | 13 | 15 | 20 | 21 |
34 | 3 | 14 | 16 | 19 | 21 |
35 | 5 | 6 | 10 | 15 | 19 |
36 | 2 | 13 | 18 | 19 | 21 |
37 | 8 | 9 | 14 | 15 | 17 |
38 | 1 | 14 | 18 | 19 | 20 |
39 | 2 | 3 | 10 | 11 | 15 |
40 | 4 | 10 | 17 | 20 | 21 |
41 | 5 | 8 | 9 | 16 | 20 |
42 | 4 | 7 | 8 | 11 | 19 |
43 | 1 | 2 | 16 | 17 | 18 |
44 | 2 | 3 | 8 | 18 | 21 |
45 | 5 | 12 | 13 | 19 | 20 |
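Both balance criteria are easy to check programmatically. The following Python sketch transcribes the Exhibit 1 design and tabulates how often each attribute, and each pair of attributes, appears across the 45 questions:

```python
from collections import Counter
from itertools import combinations

# The 45-question design from Exhibit 1; each tuple holds one
# question's five attribute codes.
DESIGN = [
    (7, 9, 10, 13, 21), (2, 6, 9, 12, 14), (1, 4, 6, 8, 13),
    (1, 7, 9, 15, 18), (10, 12, 13, 16, 18), (6, 11, 12, 17, 21),
    (3, 10, 14, 17, 20), (1, 8, 10, 12, 19), (1, 9, 11, 12, 21),
    (2, 4, 5, 11, 20), (3, 4, 9, 13, 17), (3, 6, 9, 19, 20),
    (4, 5, 14, 18, 21), (8, 11, 13, 14, 16), (2, 5, 8, 12, 17),
    (2, 6, 7, 13, 14), (1, 3, 5, 11, 14), (3, 6, 8, 17, 18),
    (1, 5, 7, 17, 21), (1, 4, 6, 10, 16), (1, 2, 15, 16, 20),
    (2, 4, 9, 16, 19), (3, 7, 12, 16, 20), (6, 7, 11, 18, 20),
    (4, 7, 12, 14, 15), (2, 7, 8, 10, 14), (5, 6, 15, 16, 21),
    (11, 13, 15, 17, 19), (3, 4, 12, 15, 18), (7, 11, 16, 17, 19),
    (5, 9, 10, 11, 18), (1, 3, 5, 7, 13), (8, 13, 15, 20, 21),
    (3, 14, 16, 19, 21), (5, 6, 10, 15, 19), (2, 13, 18, 19, 21),
    (8, 9, 14, 15, 17), (1, 14, 18, 19, 20), (2, 3, 10, 11, 15),
    (4, 10, 17, 20, 21), (5, 8, 9, 16, 20), (4, 7, 8, 11, 19),
    (1, 2, 16, 17, 18), (2, 3, 8, 18, 21), (5, 12, 13, 19, 20),
]

# Criterion 1: each attribute should appear an equal number of times.
freq = Counter(attr for question in DESIGN for attr in question)
print("Attribute frequencies:", dict(sorted(freq.items())))

# Criterion 2: each pair of attributes should appear together an
# equal number of times.
pairs = Counter(p for question in DESIGN
                for p in combinations(sorted(question), 2))
print("Pair counts range from", min(pairs.values()),
      "to", max(pairs.values()))
```

The single-attribute counts this prints are the frequency distribution presented in Exhibit 2 below.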
The number of times each attribute appears across the 45 questions is shown in Exhibit 2.
Exhibit 2 – Frequency Distribution of Attributes Across Questions
Attribute | Frequency |
1 | 11 |
2 | 11 |
3 | 11 |
4 | 10 |
5 | 11 |
6 | 10 |
7 | 11 |
8 | 11 |
9 | 10 |
10 | 10 |
11 | 11 |
12 | 10 |
13 | 11 |
14 | 11 |
15 | 10 |
16 | 11 |
17 | 11 |
18 | 11 |
19 | 11 |
20 | 11 |
21 | 11 |
Let’s look at an example. The attribute coded “1” appears 11 times across the 45 questions; the attribute coded “4” appears 10 times.
This design is not perfectly balanced, but it is close. (For technical readers, the highest correlation between any two attributes in the design is .08.)
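For the curious, that figure can be reproduced by treating the design as a 45-by-21 matrix of 0/1 indicators and correlating its columns. A minimal sketch, assuming that reading of “correlation” and reusing the DESIGN list from the sketch above:

```python
import numpy as np

# Binary design matrix: rows are questions, columns are attributes;
# entry (q, a) is 1 if attribute a + 1 appears in question q.
X = np.zeros((len(DESIGN), 21))
for q, question in enumerate(DESIGN):
    for attr in question:
        X[q, attr - 1] = 1

# Pearson correlations between attribute columns, diagonal excluded.
corr = np.corrcoef(X, rowvar=False)
off_diag = corr[~np.eye(21, dtype=bool)]
print("Largest absolute correlation:",
      round(float(np.abs(off_diag).max()), 2))
```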
Forty-five questions are far too many for any one respondent to answer. A common solution is to take the design shown above and program the interview to randomly draw a subset of questions for each respondent. To examine the appropriateness of this practice, I drew 10 independent random samples of 10 questions each from this design; they are shown in Exhibit 3. I chose 10 questions, as opposed to a larger number, because of the increasing use of smartphones to answer questionnaires.
Exhibit 3 – Frequency Distribution of Attributes Within Random Samples of Questions
Attribute | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 | Sample 7 | Sample 8 | Sample 9 | Sample 10 |
1 | 1 | 2 | 1 | 2 | 3 | 1 | 1 | 1 | 1 | 3 |
2 | 3 | 2 | 1 | 2 | 3 | 3 | 4 | 2 | 3 | 3 |
3 | 3 | 2 | 4 | 1 | 0 | 3 | 1 | 2 | 5 | 2 |
4 | 3 | 1 | 1 | 3 | 3 | 3 | 0 | 4 | 5 | 1 |
5 | 2 | 1 | 2 | 2 | 2 | 1 | 1 | 3 | 1 | 3 |
6 | 3 | 2 | 1 | 0 | 1 | 1 | 3 | 3 | 1 | 2 |
7 | 3 | 2 | 1 | 4 | 1 | 2 | 3 | 3 | 1 | 4 |
8 | 5 | 2 | 3 | 1 | 3 | 1 | 4 | 0 | 3 | 5 |
9 | 3 | 3 | 2 | 4 | 4 | 1 | 2 | 2 | 3 | 2 |
10 | 1 | 3 | 2 | 1 | 3 | 5 | 0 | 3 | 2 | 3 |
11 | 1 | 4 | 1 | 3 | 3 | 1 | 3 | 3 | 1 | 2 |
12 | 3 | 2 | 4 | 3 | 1 | 2 | 2 | 3 | 2 | 3 |
13 | 2 | 5 | 3 | 2 | 3 | 1 | 3 | 2 | 2 | 2 |
14 | 2 | 3 | 2 | 3 | 2 | 3 | 5 | 1 | 2 | 4 |
15 | 4 | 2 | 3 | 3 | 1 | 2 | 3 | 2 | 3 | 2 |
16 | 1 | 3 | 3 | 3 | 4 | 5 | 4 | 3 | 3 | 2 |
17 | 3 | 3 | 2 | 3 | 3 | 4 | 3 | 3 | 3 | 1 |
18 | 3 | 1 | 4 | 2 | 2 | 3 | 1 | 1 | 2 | 1 |
19 | 1 | 3 | 2 | 4 | 1 | 4 | 2 | 4 | 1 | 4 |
20 | 1 | 1 | 4 | 0 | 3 | 2 | 3 | 3 | 4 | 0 |
21 | 2 | 3 | 4 | 4 | 4 | 2 | 2 | 2 | 2 | 1 |
In the first sample drawn, the attribute coded “1” appears once across the 10 questions; in the second sample drawn, the attribute coded “2” appears twice across the 10 questions.
In all 10 interviews, the attributes differ substantially in how often they appear. In five of the interviews, one or two attributes are never seen by the respective respondent.
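This exercise is straightforward to replicate. The sketch below, again reusing the DESIGN list from earlier, draws 10 random samples of 10 questions and summarizes each sample's attribute frequencies; the seed is arbitrary, and the imbalance appears for any draw:

```python
import random
from collections import Counter

rng = random.Random(2024)  # arbitrary seed
for s in range(1, 11):
    # Draw one respondent's 10 questions at random from the 45.
    sample = rng.sample(DESIGN, 10)
    seen = Counter(attr for question in sample for attr in question)
    counts = [seen.get(a, 0) for a in range(1, 22)]
    print(f"Sample {s}: counts range {min(counts)}-{max(counts)}, "
          f"{sum(c == 0 for c in counts)} attribute(s) never shown")
```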
Obviously, the trade-off philosophy of max-diff has been compromised. But what are the practical implications? A fundamental consequence is that a max-diff design implemented in this manner is rendered inefficient at the respondent level. That is, the questions collectively do not obtain as much information about preference as they could. This matters when the max-diff data are analyzed in a way that uses individual respondents’ answers to calculate respondent-level estimates (e.g., Hierarchical Bayes), which are then used in a segmentation analysis. The outcome of the segmentation will be negatively affected by this inefficiency.
Another issue is that data collected in this manner are biased, in two respects. One is that the mere fact that a given attribute appears more often makes it more likely to be chosen as most or least important. Admittedly, the direction of this bias differs from one respondent to another, so the biases will balance each other out to some degree at the aggregate level. But the bias will still be treated as unexplained variation when the data are analyzed, making the analysis results less precise.
The other is that if several of the attributes collectively appear a relatively large number of times, they function as a standard of comparison in the respondent’s decisions about how to answer the questions.
Internally balanced groups
A better method of allocating the 45 questions is to use experimental design techniques to divide the questions into internally balanced groups, and randomly assign one group to each respondent. To illustrate, I have divided the 45 questions into five groups of nine questions. The results are shown in Exhibit 4.
Exhibit 4 – Frequency Distribution of Attributes Within Internally Balanced Groups of Questions
Attribute | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 |
1 | 3 | 2 | 2 | 2 | 2 |
2 | 2 | 2 | 2 | 3 | 2 |
3 | 2 | 3 | 2 | 2 | 2 |
4 | 2 | 2 | 2 | 1 | 3 |
5 | 3 | 2 | 3 | 2 | 1 |
6 | 2 | 2 | 2 | 2 | 2 |
7 | 2 | 2 | 2 | 2 | 3 |
8 | 2 | 2 | 2 | 3 | 2 |
9 | 2 | 1 | 3 | 2 | 2 |
10 | 2 | 2 | 2 | 2 | 2 |
11 | 2 | 2 | 3 | 2 | 2 |
12 | 3 | 2 | 1 | 2 | 2 |
13 | 2 | 2 | 2 | 3 | 2 |
14 | 1 | 3 | 2 | 2 | 3 |
15 | 2 | 2 | 2 | 2 | 2 |
16 | 2 | 2 | 3 | 2 | 2 |
17 | 2 | 2 | 2 | 2 | 3 |
18 | 2 | 3 | 2 | 2 | 2 |
19 | 2 | 3 | 2 | 2 | 2 |
20 | 2 | 2 | 2 | 3 | 2 |
21 | 3 | 2 | 2 | 2 | 2 |
To avoid position bias, the alternatives within each question, and the order of the questions themselves, should be rotated from one respondent to another in the interview.
It is not mathematically possible to achieve perfect internal balance in this case, but this grouping of questions is much more balanced than one produced by the random-selection approach discussed earlier.
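One simple way to construct such groups is a swap heuristic: start from a random partition into five groups of nine and exchange questions between groups whenever a swap makes the within-group attribute frequencies more even. The sketch below, again reusing the DESIGN list, illustrates the idea; it is an illustration of the approach, not necessarily the procedure used to produce Exhibit 4, and dedicated experimental design software can also do this step.

```python
import random
from collections import Counter

def imbalance(groups):
    """Sum over groups of (max - min) attribute frequency."""
    score = 0
    for g in groups:
        freq = Counter(attr for question in g for attr in question)
        counts = [freq.get(a, 0) for a in range(1, 22)]
        score += max(counts) - min(counts)
    return score

rng = random.Random(7)  # arbitrary seed
questions = list(DESIGN)
rng.shuffle(questions)
groups = [questions[i::5] for i in range(5)]  # five groups of nine

# Hill-climb: keep any between-group swap that lowers the imbalance.
improved = True
while improved:
    improved = False
    for i in range(5):
        for j in range(i + 1, 5):
            for qi in range(9):
                for qj in range(9):
                    before = imbalance([groups[i], groups[j]])
                    groups[i][qi], groups[j][qj] = groups[j][qj], groups[i][qi]
                    if imbalance([groups[i], groups[j]]) < before:
                        improved = True  # keep the swap
                    else:
                        # Revert a swap that did not help.
                        groups[i][qi], groups[j][qj] = groups[j][qj], groups[i][qi]

print("Final imbalance score:", imbalance(groups))
```

Whatever method is used to build the groups, question order and the positions of alternatives within each question would still be randomized per respondent at interview time, as noted above.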